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ESTIMATION IN ADDITIVE MODELS WITH HIGHLY OR 
NONHIGHLY CORRELATED COVARIATES 

By Jiancheng Jiang 1 , Yingying Fan 2 and Jianqing Fan 3 
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California and Princeton University 

Motivated by normalizing DNA microarray data and by predict- 
ing the interest rates, we explore nonparametric estimation of addi- 
tive models with highly correlated covariates. We introduce two novel 
approaches for estimating the additive components, integration esti- 
mation and pooled backfitting estimation. The former is designed 
for highly correlated covariates, and the latter is useful for nonhighly 
correlated covariates. Asymptotic normalities of the proposed estima- 
tors are established. Simulations are conducted to demonstrate finite 
sample behaviors of the proposed estimators, and real data examples 
are given to illustrate the value of the methodology. 

1. Introduction. The problem of estimating additive components with 
highly correlated covariates arises from the normalization of DNA microar- 
ray. Since the late 1980s, Affymetrix was founded with the revolutionary 
idea to use semiconductor manufacturing techniques to create GeneChips 
(an Affymetrix trademark) or generic DNA microarrays. It makes quartz 
chips for the analysis of DNA microarrays and covers about 82% of the 
DNA microarray market. A single chip can be used to do thousands of ex- 
periments in parallel, so it produces a lot of Affymetrix GeneChip arrays 
which demand proper normalization for removing systematic biases such as 
the intensity effects. 

Much research has been devoted to eliminating the systematic biases such 
as the dye, intensity and print-tip block effects. Examples include the rank- 
invariant selection method of Tseng et al. (2001), the lowess method of 
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Dudoit et al. (2002) and various information aggregation methods of Fan et 
al. (2005), Fan, Huang and Peng (2005), Huang, Wang and Zhang (2005) 
and Huang and Zhang (2005), among others. 

Fan et al. (2005), Fan, Huang and Peng (2005) propose a semilinear in- 
slide model (SLIM) to remove intensity effects and identify significant genes 
for Affymetrix arrays. Suppose that there are G genes and for each gene 
there are J replications ( J > 2). Let A g j and B g j be the log-detection signal 
of the gth probe set in the jth control and treatment arrays, respectively. 
Then, we compute the log intensities and log-ratios, respectively, as 

Xgj = iAgj + Bgj)/2> ^gj = Bgj ~ Agj. 

Fan et al. (2005), Fan, Huang and Peng (2005) use the following model to 
estimate the treatment effect, the smooth intensity effect: 

(1.1) Ygj=OLg + mj(Xgj)+Egj, Q = 1 , . . . , G] j = 1 , . . . , J, 

where a g is the treatment effect on gene g, mj(X g j) represents the array- 
dependent intensity effect to be estimated and %/'s are independent noises 
with zero means. For identifiability, we assume that E[rrij(X g j)] = 0. 

Directly estimating the treatment effects {a g } is not a good idea due to 
the existence of unknown intensity effects, as well as the small size J. In this 
paper we first treat {a g } as nuisance parameters and focus on the estima- 
tion of m, 's. Once a good estimate rhj of m, for each j is obtained, a g can 
be estimated as a g = jEj=iftj ~ ^ji-^gj))- Therefore, it is essential to 
efficiently estimate treatment effects {a 9 }. The setup applies to the c-DNA 
microarray data [Fan, Huang and Peng (2005), Huang and Zhang (2005)] 
and Agilent microarray data [Patterson et al. (2006)]. Moreover, it is also ap- 
plicable to other problems where confounding effects can nonparametrically 
be removed. 

Fan et al. (2005) used a backfitting algorithm to estimate iteratively the 
intensity effect and the treatment effect. While this method is successful 
for removing the systematic biases in some certain situations, mathematical 
properties of the resulting estimators are unknown which requires further 
study of the estimation. On the other hand, when performing the estimation 
method, we found that it is unstable and even fails to converge in some 
situations. A careful study of this problem reveals that it is caused by the 
high correlation between intensities. An illustrating example is the DNA 
microarrays data analyzed in Fan et al. (2005). In this example, the log- 
intensities across different chips are highly correlated, which is evidenced in 
Figure l(left), due to the repeatability and accuracy of the measurements. 
A close look at the almost identical relationship between covariates suggests 
that \X g i — X g 2\ —> 0. This suggests a simple working model for calibrating 
the following plausible correlation structure: 

Xgl = Xg 2 + bGU g 2, 
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Fig. 1. Left panel: highly correlated log intensities in Affymetrix array data with J — 2; 
right panel: highly correlated interest rates. {Xt} represents the weekly data of the 6-month 
treasury bill secondary market rates in the period of June 1, 1988 to June 1, 2008. 

where be — > and u g 2 is random noise. Under such a setting, the problem of 
effectively estimating the confounding effect {mj(-)} challenges statisticians. 
The high correlation reduces the accuracy of estimating rrij(-), but G in such 
an application is also very large, in an order of tens of thousands. 

The problem of highly correlated covariates appears often in modeling 
time series data such as interest rates. Suppose that we would like to use 
the past 4 weeks' (X t -i, . . . , Xt-4) interest rates to forecast the return of 
a stock or index Yt or the interest rate itself Y-t = Xt in the next week. A 
reasonable nonparametric model is the following additive model: 

Y t = fj, + mi(AVi) H h m 4 (AV 4 ) + e t . 

Due to the continuity of the interest rate dynamics, the covariates in the 
above additive model is also highly correlated and can be handled by the 
idea in this paper. Figure 1 (right) shows the scatter plot of X t versus Xt-i 
using the weekly data for the 6-month treasury bill secondary market rates 
in the period of June 1, 1988 to June 1, 2008. 

Existing methods in the literature do not appear enough to address the 
problem with additive modeling with highly correlated covariates, and a new 
methodology is needed. In particular, in addition to the aforementioned 
failure in convergence, the backfitting algorithm usually converges slowly 
due to the very large number of genes G which is usually in the order of 
tens of thousands in a typical microarray application. This motivates us 
to develop statistical methods fitting the smooth confounding effect model 
(1.1) with/without highly correlated intensity effects. 

The above model received attention in Fan et al. (2005), Fan, Huang and 
Peng (2005). However, there is no formal study of modeling highly corre- 
lated covariates X g j. For the usual correlation situation, Fan, Huang and 



4 



J. JIANG, Y. FAN AND J. FAN 



Peng (2005) considered the estimator of rrij using the profile least squares 
and obtained only an upper bound for the conditional mean squared er- 
ror of the estimator. However, information across arrays is not used, and 
the asymptotic distribution of the estimator is unknown which makes the 
inference about the intensity effects difficult. 

In this investigation, we introduce two methods for estimating the non- 
parametric components rrij, integration estimation and pooled backfitting 
estimation. The former is tailored for modeling highly correlated intensity 
effects and is a noniterative estimator with fast implementation. It relies on 
estimating the derivative function in a varying coefficient model, and allows 
us to handle a very large amount of observations. The latter is an itera- 
tive estimate which is designed for modeling nonhighly correlated intensity 
effects. Asymptotic normalities of the proposed estimators are established. 
The extent to which the high correlation affects the rates of convergence 
is explicitly given. Simulation studies are conducted to demonstrate finite 
sample behaviors of the proposed methods. 

The paper is organized as follows. In Section 2 we introduce the integra- 
tion estimation method along with an alternative of robustness. In Section 3 
we develop pooled backfitting estimation of the intensity effects. In Section 4 
we conduct simulations. In Section 5 we illustrate the proposed methodology 
by two real data examples. Finally we conclude the paper with a discussion. 
Details of assumptions and proofs of theorems are given in Appendices A 
and B. 

2. Estimation of additive components when covariates are highly corre- 
lated. To use information across arrays, one can take a difference operator 
to remove the nuisance parameters {a g } which leads to additive models. 

Specifically, let Y g = Y g \ — Y gk and s g = e g \ — e gk . Then by (1.1), for 
k — , . . . . J . 

(2.1) Y^=m x {X gl )-m k {X gk ) + ef\ g = l,...,G, 

which are additive models introduced by Friedman and Stuetzle (1981) and 

Hastie and Tibshirani (1990) where e g are the errors with zero means, and 

for j ^ k, Co\{e" g \e g k ' > ) = a 2 and Var(e^) = 2a 2 . The additive components 
can be estimated via the backfitting method. Due to the high correlation be- 
tween X g i and X g k, the estimate based on the backfitting algorithm usually 
fails in convergence, and the existence of a backfitting estimator is prob- 
lematic. Moreover, asymptotic properties of the backfitting estimators are 
unknown in this situation. Thus a new methodology is needed to deal with 
this problem. To this end, in the following we focus on the cases with highly 
correlated covariates and introduce the integration estimation and then es- 
tablish asymptotic normality of the resulting estimators under a working 
model. The estimators are consistent, regardless of the working model. 
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2.1. Estimation when covariates are highly correlated. As illustrated in 
the previous section, covariates X g j (for j = 1, . . . , J) may be very close and 
highly correlated, so it is convenient to assume that 

(2.2) A gk = X g i — X gk — > 0. 

Under such a setting, the asymptotic properties of the backfitting estimates 
are unknown, and the convergence of the backfitting algorithm may also be 
a problem since the required condition, that is, the existence of the joint 
density of covariates, is not always satisfied. See, for example, Opsomer and 
Ruppert (1997, 1998). Assume m'( is continuous; then by Taylor's expansion, 

TOi(X g i) = m 1 (X gk ) + mi(X 9 fc)A 9fc 

(2.3) 

+ 1 1 m'((X gk )A 2 gk + o(A gk ) 2 . 
Substituting (2.3) into (2.1), we obtain that 
(2.4) Yf > = ?n kl (X gk ) + m' 1 (X gk )A gk + , 

where m kl (X gk ) = mi(X gk ) -m k (X gk ) and if' = \m'{(X gk )A 2 gk + o(A gk ) 2 + 

e g k \ Model (2.4) is actually a varying coefficient model, since the coefficient 
functions m k i{-) and m'i{-) are unknown functions of X gk . This allows us 
to estimate the unknown coefficient functions m'^-) using local smoothing 
techniques. Given an interior point x E supp [/&(•)], using the local linear 
approximation when \X gk — x\ < h, we obtain that 

m kl {X gk ) + m[(X gk )A gk 

(2.5) 

w a + a.i(X gk -x) + A gk {/3 + Px{Xg k - x)}. 
Then the coefficient function m^(-) can be estimated by minimizing 



G 

x) 



,_^[Y g W - ao - ai (X gk 

9=1 

(2.6) 

- Ag k {(3 Q + f3 l (Xg k - X)}] 2 K h (Xg k ~ x) , 

where Kh(-) = /i _1 /^(- /h) with K{-) being a kernel function and h be- 
ing a bandwidth controlling the amount of data in smoothing. Denote by 
{a(x), f3(x)} with a(x) = (ao(x),ai(x)) and f3(x) = (/3o(x), (3i(x)) the so- 
lution to the above equation. Then /3q(x) and (3i(x) estimate m'^x) and 

m![{x), respectively. If A gk = o(l), then E[e g k ^\X gk = x] = o(l), and hence 
the above estimator is consistent. The method is noniterative and can han- 
dle the situation where G is very large. Once the derivative m'i(-) is given, 
the component m\ in model (2.1) can be derived as follows. 
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Let K = di&g{K h (X lk - x), . . . , K h (X Gk - x)}, §{x) = (d , di, /3 , /3i) T , 

= (l,Xg k ~ X,b G Ug k ,b G Ug k (Xg k ~ x)) 

and Z = (Zi, . . . , Zq) t . Then 9{x) admits the following closed form: 

(2.7) fl>) = (Z T KZ)- 1 Z T KY ((!) , 

where Y (fc) = (Y} k \ . . . , Y G k) f. Let 

9(x) = (mi k (x),m' lk (x),m' 1 (x),m'l(x)) T . 

Then 9{x) estimates 9{x), and m'^x) is estimated by rh[(x; k) = e^9(x) with 
e 3 = (0,0,l,0) T . 

Since averaging can reduce the variance of estimation, we propose to es- 
timate m'i(-) by the following average: 

J 

(2.8) m' 1 (x) = (J-l)" 1 ^m / 1 (x;A:). 

k=2 

Note that, for each k, rhi(x;k) is consistent. The estimator m^-) is also 
consistent. From the estimated derivative function, the original function 
mi (•) can consistently be estimated using integration which we now detail 
below. 

Let Fj(-) and fj(-) be, respectively, the distribution and density functions 
of Xgj. Due to the identifiability condition E[mi(X g x)] = and m\{x) = 
mi(xo) + j£ m'^t) ott(for any xq £ supp[Fi(-)]), we obtain that 

mi(x )+ I m' 1 (t)dt\dF 1 (x) = 

Jx J 

and hence mi(xo) = — J m'i(t)dtdFi(x). Therefore, m\(x) can be esti- 
mated by 

/* f*X f*X 

(2.9) rhi(x) = - / rh' l (t)dtdF 1 {x)+ I rh'^dt, 

J J XQ J Xq 

where F\ is the empirical estimator of F\. Note that the first term in (2.9) 
is a constant, making merely the estimated function to satisfy an empirical 
version of the identifiability condition. Similarly, we can estimate the other 
components' m/s (for j = 2, . . . , J) in model (2.1). Such denned estimators 
are naturally consistent due to consistency of the estimators of derivative 
functions. 
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2.2. Asymptotic normality. To provide in-depth analysis on the behav- 
ior of the estimators defined in (2.7)-(2.9), we model explicitly the high 
correlation among covariates. One viable choice is to employ the following 
working model: 

(2.10) X gl = X gk + b G u gk , 

where b G — > and {u gk }^ =l are noises of zero mean and finite variance. 
Assume that the density function of u g k, Pk(%), has a compact support and 
that {u g k} are independent of {X gk } for fixed k. This specification allows for 
heteroscedasticity of the errors. Obviously, in model (2.10) the correlation 
between X g \ and X gk goes to one as ba — > 0. There are various alternative 
methods for modelling high correlation between two variables. We focus only 
on model (2.10) to make an attempt. Note that the working model (2.10) is 
only used to derive the asymptotic properties. The estimator itself does not 
depend on such an assumption. 

Denote by fij(K) = Jt j K(t)dt and Vj{K) = Jt j K 2 (t)dt. Let H = 
diag(h, bah), S = diag(N,N), V = diag(i/,i/), C = diag(c 2 ,c 2 ) and c* = 
(c^,c^E(u 3 lk )) T where h = diag(l,/i), N = diag{/i (-FQ, ^(K)}, u = 
&\a,g{vo(K), u^(K)} and Cj = (fij(K), [ij + i(K)) T . The following theorems 
describe the asymptotic properties of the proposed estimators. 

Theorem 2.1. Suppose that the conditions in Appendix A hold. Under 
the working model (2.10), if Gh 5 = 0(1) and Ghb G = 0(1), then 

VGh{H[6(x) - 6{x)} - b(x)(l + op(l))} A A/"(0, E(x)), 

where 

b(x) = \h 2 S- l C{m'i k (x),b G mf\x)) T + ±b 2 G m'( (xjS^c* 
and S(x) = 2a 2 /f 1 (^)S" 1 VS- 1 . 

Corollary 2.1. Under the conditions in Theorem 2.1, 

VGhba{fh^{x\k) - mi (a) - &i(z)} A N{0,aj(x)), 

where 

h(x) = \h 2 ^ 2 {K)^\K)mf\x){l + op(l)) + \b G m'{{x)E(u\ k ){\ + o p (l)) 
and a 2 (x) = 2a 2 f{ 1 (x)u {K)n~ 2 {K). 

The above corollary shows that the data from two arrays suffice to obtain a 
consistent estimate of the derivative function. However, the high correlation 
reduces the effective sample size from G to Gb G , in terms of the rates of 
convergence. 
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In order to present asymptotics of the average estimator (2.8), we need 
the dependence structure of {u g k} across k. Let p(£,k) = E(u g £U g: k), which 
does not depend on g, and 



The above asymptotics of the estimators is derived under the working 
model (2.10). However, as previously stated, if condition (2.2) holds, our esti- 
mator for mi(x) is consistent whether or not the working model (2.10) holds. 
This furnishes robustness of our estimator rh'{x) against mis-specification 
of the correlation between covariates. If interested in estimating the deriva- 
tive function, one can directly compute the asymptotic bias and variance 
of rhi (x) and obtain the optimal bandwidth by minimizing the asymptotic 
mean square error so that a data-driven bandwidth selection rule can be 
developed as in the one-dimensional nonparametric regression problem. In 
the following we state the asymptotic normality of the integrated estimator. 

Theorem 2.3. Suppose that the conditions in Appendix A hold. Under 



the working model (2.10), if Gb 2 G h 4 = O(l) and Gb% = 0(l), then 
VGbainnix) - mi(x) - B 1 (x)(l + 0p (l))} A M(0,a 2 (x)), 



and hence the asymptotic normality of the estimator does not depend on the 
smoothing parameter h nor the kernel K. It parallels the result of Jiang, 
Cheng and Wu (2002) for estimating distribution functions and contrasts 
with the dependence on smoothing parameter of the nonparametric function 
estimation. 




Theorem 2.2. Under the conditions in Theorem 2.1 
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Remark 2.2. The estimate rh\{-) achieves a maximum convergence rate 
0{G^ 4 ) when b G = G^G" 1 / 4 ). The convergence rate can be improved if one 
uses a higher order polynomial approximation in (2.5). 

2.3. A pooled robust approach. In model (2.1), we aim at estimating 
mi(-). It has various versions of implementations. To illustrate the idea, 
we use aggregated local constant approximation along with the Li-loss to 
illustrate the versatility. For \X gk — x\ = O(h), we have m k i(X gk ) ~ mki(x) 
and m'^Xgk) ~ m'^x). Then, by (2.4), we can run the local regression by 
minimizing 

J G 

(2.11) " " Po\k\K h (X gk - x) 

k=2 g=l 

with Y g (k) = Y gl - Y gk , A gk = X gl - X gk and K h (-) = h- l K(-/h). Notice 
that we pool data from different replicates in (2.11) to obtain more accurate 
estimators, and the L\ norm is used to alleviate the influence of outliers. 
Denote by (&2,0) ■ • • > &Jfii A) 0*0) the solution to the above minimization prob- 
lem. Then $q{x) estimates m'^x). Integrating $q{x) leads to an estimate of 
mi(x). In our experience, this estimation approach performs similarly to the 
method in previous sections. 

3. Backfitting estimation of additive components. In this section, we 
introduce pooled backfitting estimators of rrij and study their asymptotic 
properties under nonhigh correlation situations. 

3.1. Fitting a bivariate additive model using the local linear smoother 
based on the backfitting algorithm. There are some methods for fitting the 
additive model (2.1). For example, the common backfitting estimation of 
Buja, Hastie and Tibshirani (1989) and Opsomer and Ruppert (1997, 1998), 
the marginal integration methods of Tj0theim and Auestad (1994), Linton 
and Nielsen (1995) and Fan, Hardle and Mammen (1998), the estimating 
equation method of Mammen, Linton and Nielsen (1999) and the smooth 
backfitting method in Nielsen and Sperlich (2005), among others. For illus- 
tration, we will use the common backfitting algorithm based on the local 
linear smoother as a building block to estimate the additive components. 
Other estimation methods can similarly be applied. 

To ensure identifiability of the additive component functions rrij(-), we 
impose the constraint E[mj(X g j)] = for j = 1, J. Fitting the addi- 
tive component rrij(-) in (2.1) requires choosing bandwidths {hj}. The op- 
timal choice of hj can be obtained as in Opsomer and Ruppert (1998). 
We here follow notation that was introduced by Opsomer and Ruppert 
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(1997). Put K h Ax) = h^Kif), K s (v) 



{mj(X 1:i ),. 



"3 

, m 



v K(v), Hj = diag(l, hj), m.j 



(X lj ,...,X Gj ) T andY fc = (y i ( 



Y^ k) ) T . 



' G 



-j( X Gj)} ,Xj 

The smoothing matrices for local polynomial regression are 

Sj = ( s j,*ij>---' s j,*G J ) T > 
where sj represents the equivalent kernel for the j'th covariate at the point 

Xj. 

(3-1) sj^ =e[(xfK, 3 Xg)- 1 xfK Ir 

Here ej = (1, 0), K Xj = <X\a,g{K h . (X^ - Xj),..., K h . (X Gj - Xj)} and 

'I (Xij-Xj) 



1 



From (2.1), m^'s can be estimated through the solutions to the following 
set of normal equations [see Buja, Hastie and Tibshirani (1989), Opsomer 
and Ruppert (1997)]: 

' v( fc ) 

where Si- = (I G — 11 T /G)Sj is the centered smoother matrix, and 1 is a G x 1 
vector whose elements are all ones. In practice, the backfitting algorithm 
[Buja, Hastie and Tibshirani (1989)] is usually used to solve these equations, 
and the backfitting estimators converge to the solution, 



Ig 






111 1 




'si 


.82 













(3.2) 



r ~ ( fc ) ~ 




Ig 


s;" 


-1 




.-m fc 




s* k 









Y (fc) for k = 2, . . . ,J, 



where the superscript in ihj is used to stress the dependence of rhi on k 

iacl 

HIE 

maxi<j< G X; 



If ||S^S^.|| < 1, then the backfitting estimators exist and are unique where 
we use ||A|| to denote the maximum row sum matrix norm of the square 
matrix A : || A ■ 
is 

fik(xi,x k ) 



l= i A sufficient condition for ||S|S^|| < 1 



(3.3) 



sup 

Xl,X k 



fi(xi)fk(xk) 



1 



<1, 



where fj(xj) is the density of Xj, and /ife(xi,Xfe) is the joint density of X g \ 
and X g k [see Opsomer and Ruppert (1997)]. We assume in this section the 
above condition holds. Note that this condition does not hold for the work- 
ing model (2.10) since the joint density of (X g i,X g k) is nearly degenerate. 
Solving (3.2), we get 



(3.4) 



m 



(k) 



{I G - (I G - S^n^G - S{)}Y^ = W lk Y 



(k) 



HIGHLY CORRELATED ADDITIVE MODELING 



11 



Since averaging can reduce the variance, we propose to estimate mi by 

J 



(3.5) m 1 = (J-l)- 1 X;mf ) , 

k=2 



which is termed as the pooled backfitting estimator of mi. For other com- 
ponents nij, they can be estimated in a similar way. Thus, in the following, 
we will focus on the estimation of mi. The integration method in the pre- 
vious section is simpler and much faster to compute since it uses only one 
smoothing parameter h and does not involve any iteration. 

To derive the asymptotic properties of rhi, in the following we introduce 
some notation in Opsomer and Ruppert (1997). Define 

D x ,hi ={t:{x + hit) £ supp(/i)} n supp(-FT). 

Then x is called "an interior point" if any only if D x ^ x =supp(-fT). Other- 
wise, x is a boundary point. Define the kernel Kh\[u) = K(u)/ (Aq(K), which 
is the asymptotic counterpart of the equivalent kernel induced by the local 
linear fit. Then ^(-KVn) = ^{K)^ (K) and z/o(-Km) = vq(K)(j,q 2 (K). Let 
Tj* fc be a matrix whose (i, j)th element is 

[Ti k ) ij = G- 1 {f lk (x 1 ,x k )fr 1 (x l )f- 1 (x k ) - 1}. 

Let tj represent the 5th row of (I — T^) -1 , and e g be the 5th unit vector. 

Theorem 3.1. Suppose that the conditions in Appendix A hold. If X g \ 
is an interior point, then as G — > 00 : 

(i) the bias of rh\(X g i) conditional on X = (Xi,Xfc) is 
E{rh x {X gl ) - mi(X 9 i)|X} = 61 - b 2 + O p + o p (j^ h j^j > 

where h = \hln 2 {K {1) )[m'{{X gl ) + {(tj - ej)m? - E{m'{{X gl ))}} and 

1 1 J 

b 2 = -MK(i))j^Y, h l^ E ( mf k(X gk )\X 1 ) - E{ml(X gk ))}- 

k=2 

(ii) the variance of rh\{X g i) conditional on X is 
Var{m!(AV)|X} = -J—^ a 2 f^\X gl ) VQ {K {1) ) 



J-lGhi J1 v S1 ' ur w ' p \Gh 
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As in Corollary 4.3 of Opsomer and Ruppert (1997), if the covariates are 
independent, the conditional bias of rhi(X g i) in the interior of supp(/) can 
be approximated by 

£{mx(AV) - mi (X gl )\X} = h 2 lf i 2 (K {l) ){m'((X gl ) - E{m'{{X gl ))} 



2 

+ o p (i/Vg) + 



3.2. Fitting a J-variate additive model using local linear smoother based 
on backfitting. In the previous section, we used the differences between any 
two different replicates for genes to eliminate the nuisance parameters. It 
resulted in two-dimensional additive models, which were easy to implement, 
but for each additive model, the estimator was asymmetric. In the following 
we use differences between any replicate and the average of those replicates. 
This will lead to a J-dimensional additive model with symmetric estimation. 

Let 

J J 

3=1 i=i 
and e g = J~ l ^2j = \ £gj- Then by (1.1) we have 

(3.6) Y g = a g + ffl(Xg) + Eg. 

Subtracting (3.6) from (1.1), we obtain that for j = 1, . . . , J, 
(3-7) Y* =~J2 m k{X gk ) + ^j^m^Xgj) + e* gj , 

where Y*- = Y g j — Y g and e* . = e g j — E g . It can be seen that Var(e* -) = 
(1 - l/J)cr 2 and Cov(e* -,eL) = for k. For any fixed j, let 

ml j {X g] ) = {J-l)J' 1 m ] (X gj ) 

and m* k -{Xgk) = —J~ 1 mi c (X g f t ) for k ^ j. Then (3.7) becomes 

J 

(3.8) Y g ) = ^ml^X gk ) + e* gj . 

k=l 

This is a J-variate additive model. Again, we can estimate the additive com- 
ponents using the local linear smoother based on the backfitting algorithm. 

Fitting the additive component rrij in (3.7) requires choosing bandwidths 
{hj}. The optimal choice of hj can be obtained as in Opsomer and Ruppert 
(1998) and Opsomer (2000). Put 

m^ = {m^(X lfe ),...,m^.(X Gfc )} T and Y* = (Y% , . . . , Y^f. 
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Then the additive components can be estimated through the solutions to 
the following set of normal equations: 



J 



°1 

Ig 



j 





* 








* 

m 2,i 












LOjJ 







riG 


Ig • 


• sj; 


-i 


-s;- 


Y* = M -1 CY| 






.8} 


S} • 


• Ig- 




c* 
L»j J 





where S* = (Ig — 11 /G)Sj is the centered smoother matrix, and Sj is de- 
fined the same as before. The backfitting estimators converge to the solution, 



(3.9) 



provided that the inverse of M exists. 

As in Opsomer (2000), we define the additive smoother matrix as 

W^EfcM^C, 

where is a partitioned matrix of dimension G x GJ with an G x G 
identity matrix as the fcth "block" and zeros elsewhere. Thus the backfitting 
estimator for m* k ■ is 



(3.10) 

Denote by mi- 



mator of m* is then rh* 



m^.=W fe Y*. 

and Wm = Yl 
W M Y*. Let W 



J 



M 



Wjfc. The backfitting esti- 
be the additive smoother 



matrix for the data generated by the (J — l)-variate regression model, Y' Q 
Y,l>=l,+k m t'A X 9k')+ £ 



HI 



If 



fe li 



90 ' 



< 1 for some k E {1,..., J}, by Lemma 2.1 of Opsomer 



(2000), the backfitting estimators exist and are unique, and 



(3.11) 



W* 



Ig - (Ig 
(Ig-S^ 



-fc]^ 



In this section we make the same assumption that is made in Opsomer 

(2000), that is, the inequality ||S£W^ fc] || < 1 holds. 

For each j, m£ • estimates m£ .. Define riifcj equals —Jm* k - for k^j and 

J(J — l) -1 !!^ for k = j. Then ihfcj estimates m^. Since the variance of 
"frfcj U 7^ ^) i s much bigger than that of m^fc, taking the average over j does 
not help reduce the variance of riifc^. We will use = rhk,k as an estimate 
of mfc. The following theorem is a corollary of Theorem 3.1 in Opsomer 
(2000). 
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Theorem 3.2. Suppose that the conditions in Appendix A hold. If X g \ 
is an interior point, then as G — > oo : 

(i) The conditional bias ofrhi(X g i) is 
E{m 1 (X gl )-m 1 (X gl )\X} 

=ej(/-sr< 1] r 1 
x r h 2 [V 2 mi _ E{ra!i)] _ S * B( ^ | + 0p{h 2 )} 

where B ( _ 1} = (W^~ 1] - I G )m ( _ 1) and m ( _ 1 ) = J2k=2 m k- 

(ii) The conditional variance of rhi{X g i) is 

Var{m 1 (X 3l )|X} = ^^L ( 7 2 /r 1 (^i)^(%)) + 0p (^). 

As in Corollary 3.2 of Opsomer (2000), if the covariates are mutually 
independent, the conditional bias of rh\ at an interior observation point X g \ 
is 



£{mi(X ffl )-mi(X g i)|X} 

] -h\[m'[{X gl ) - E{m'[{X gl ))} 



+ O p (l/VG) + o p (j2h] ) j 



This demonstrates that the estimators based on fitting bivariate additive 
models and a multiple additive model have the same asymptotic bias and 
variance in the interior points when the covariates are independent. However, 
the estimator based on fitting bivariate additive models is easy to implement. 

4. Simulations. We here conduct simulations to compare the perfor- 
mance of the proposed integration estimation method with the backfitting 
estimation. To this end, we consider model (1.1) and set J = 3 and G = 3000. 
The first variable X g \ is generated from a mixture distribution; that is, X g ± 
is simulated from the probability distribution 0.0004 x (x — 6) 3 /(6 < x < 16) 
with probability 0.6 and from the uniform distribution over [6, 16] with prob- 
ability 0.4. The other two variables X g k {k = 2, 3) are generated from model 
(2.10) with b G = G~ 7 and u gk ~ u . d . iV(0, 1) where 7 = 0.05,0.1 and 0.2 are 
used to control the correlation between X gk and X g \. It is easy to calcu- 
late that 7 = 0.05,0.1 and 0.2 correspond to correlations 0.9919, 0.9962 and 
0.9992 between X g \ and X g %, respectively. The correlations between X g i 
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and X g 3 are very close to the correlations between X g ± and X g 2 for different 
values of 7. The treatment effect a g is generated from the double exponential 
distribution ^exp(— \x\). The response variable Y kg is simulated from model 
(1.1) with mi(x) = v%(sin(x) - 0.2854), m 2 (x) = 0.01(x - ll) 3 - 0.2913, 
m 3 (x) = 0.2exp(x/5) - 3.0648 and e kg ~ u .d. N(0, 1). 

The mean square error (MSE) is employed to evaluate the performance 
of different estimation methods. The MSE of an estimate rhj of the func- 
tion rrij and the MSE of an estimate a = . . . ,&g) T of the vector a = 
(«i, . . . ,«g) T are defined, respectively, as follows: 

1 G 

MSE(mj) = - YSfrj&ai) ~ m j{X gj ))\ 
1 G 

MSE(a) = -J2(a g -a g ) 2 . 
9=1 

The integration estimation procedure and the pooled backfitting method 
are applied to estimate mj(-) at 100 equispaced grid points over the interval 
[6, 16] using 500 simulated datasets. For the backfitting method, we first 
tried the Gaussian kernel and the optimal data-driven bandwidth rule in 
Opsomer and Ruppert (1998) and noticed that the estimated curves for the 
backfitting estimators were over-smoothed when 7 is smaller. Following the 
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reviewers' suggestions, we then used a smaller bandwidth, that is, 0.4 times 
the optimal bandwidth. For the integration method, its performance is not 
sensitive to the choice of bandwidth, as long as it is not chosen too large 
(see Theorem 2.3). Thus we just chose a reasonably small one. The medians 
of the fitted curves over 500 simulations are summarized in Figure 2. It is 
seen from Figure 2 that, when 7 becomes larger, the correlation between 
covariates gets higher and the backfitting method performs worse while the 
integration method becomes better. In fact, when 7 = 0.2, our integration 
procedure gives almost perfect estimates of the true function: very little 
bias is involved. Similarly, we estimate the functions 7712(2;) and 7773(2;). The 
estimated curves are depicted in Figures 3 and 4. It can be seen that due 
to the high correlation, the pooled backfitting method gives estimates that 
are highly biased while our integration method produces almost perfect fits. 
The variations of the estimates are accessed by MSE, and the median of 
these 500 MSEs can be found in Table 1. 

Now we estimate a g ,g = 1, . . . , G. For each of the 500 simulated data sets, 
let a g j = Y g j — rhj(Xgj), for g = 1, . . . , G and j = 1, . . . , J. Then for each of 
the simulated data sets we estimate a g as 

1 J 

"<7 = j ^ ^93 • 

The performance of a is evaluated by MSE. The median of the 500 MSEs 
is then calculated. Table 1 reports the medians of MSEs obtained by using 




Fig. 3. The same as Figure 2 except that the estimated function is 
m 2 (x) = 0.01(3 - ll) 3 - 0.2913. (a) 7 = 0.05; (b) 7 = 0.1; (c) 7 = 0.2. 
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Fig. 4. The same as Figure 2 except that the estimated function is 
m 3 (x) = 0.2exp(x/5) -3.0648. (a) 7 = 0.05; (b) 7 = 0.1; (c) 7 = 0.2. 



the integration and pooled backfitting methods. The integration estimation 
method dominates the backfitting method in almost all cases. 

5. Real data example. 

5.1. Microarray data analysis. We apply our new estimation methods 
to the Neuroblastoma data set collected and analyzed by Fan et al. (2005). 
Neuroblastoma is the most frequent solid extra cranial neoplasia in children. 
Various studies have suggested that microphage migration inhibitory factor 
(MIF) may play an important role in the development of neuroblastoma. 
To understand the impact of MIF reduction on neuroblastoma cells, the 
global gene expression of the neuroblastoma cell with MIF-suppressed is 
compared to those without MIF suppression using Affymetrix GeneChips. 



Table 1 

Medians of MSEs for the estimated nij and a 





Intef 


jration estimation method 


Pooled backfitting method 


7 = 0.05 7 = 0.1 


7 = 0.2 


7 = 0.05 


7 = 0.1 


7 = 0.2 


mi 


0.1471 


0.0698 


0.1032 


0.0774 


0.2411 


0.8169 


ni-2 


0.0121 


0.0177 


0.0689 


0.2310 


0.1746 


0.2343 


1713 


0.0202 


0.0245 


0.0750 


0.1007 


0.0754 


0.1254 


Q 


0.3542 


0.3565 


0.3647 


0.3963 


0.4125 


0.4962 
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Among extracted detection signals, only genes with all detection signals 
greater than 50 were considered, resulting in 13,980 genes in three control 
and treatment arrays, respectively. The details of the design and experiments 
were given by Fan et al. (2005). 

For this DNA microarray data set, J = 3 and G = 13,980. Model (1.1) 
was used in Fan et al. (2005) to assess the intensity and treatment effects on 
genes with rrij(-) representing the intensity effect for the j'th array and a g 
denoting the treatment effect on gene g. As discussed in Section 1, model 
(1.1) leads to the additive model 

(5.1) Y^= mi {X gl )-m k {X gk ) + ef\ & = 2,3, 

where Y g k ^ = Y g \ — Y gk . Now we fit the data using model (5.1) and estimate 
the components by the integration and pooled backfitting methods. The re- 
sulting estimates indicate similar forms of the intensity effects for different 
slides, as presented in Figure 5. However, the integration and pooled back- 
fitting estimates differ substantially which raises a question about which 
estimate is more reliable. 

In the implementation of the backfitting method, we encounter an almost 
singular matrix problem when using the Matlab software due to the highly 
correlated log intensities X g j , which leads to the extremely low rate of con- 
vergence and unreliable results, even though it reports the final estimates. 
Hence, by intuition and the previous theory the integration estimation is 
better. In addition, since both estimation methods lead to roughly linear 
forms of the intensity effects functions nij(-) for j = 1,2,3, the linear model 
seems plausible. This suggests that we should fit the data using the linear 
model 

Y g W = P + foXgl + foXg k + 4 fc ) , 




2 4 6 8 10 12 2 4 6 8 10 12 2 4 6 8 10 12 



X1 X2 X3 

Fig. 5. Fitted regression curves as estimates of the intensity effects for different arrays. 
Left panel: for the first array, middle panel: for the second array, right panel: for the 
third array; dashed (black): the backfitting estimate, dashed-dotted (blue): the integration 
estimate, solid (red): the linear regression. 
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Table 2 

The standard deviation of residuals from different estimation for the additive models 

with different covariates 



Covariates in models 


Integration estimation 


Pooled backfitting 


LSE 


(X 1 ,X 2 ) 


0.430 


0.451 


0.429 


(X!,X 3 ) 


0.421 


0.462 


0.428 


(X 2 ,X 3 ) 


0.375 


0.455 


0.455 



as an alternative of model (5.1). For the integration estimation method we 
do not have the singularity nor convergence problem. Thus we still work 
with model (5.1). 

Figure 5 displays the estimated functions for each array. The pooled back- 
fitting estimates are almost flat and deviate far away from the trends re- 
vealed by the least squares estimates (LSEs) for the linear model, but the 
integration estimates share similar trends as the LSEs. It seems that the 
estimated intensities from the integration method are increasing and have 
a similar trend which suggests that the intensity effects for the three slides 
are similar. Table 2 reports the standard deviation of residuals from the 
different estimation methods. It favors the integration method. 

5.2. Interest rate data analysis. In this subsection, we analyze the in- 
terest rates data introduced in the Introduction. For simplicity, we consider 
the model with two additive functions 

X t = fi + mi(X t -i) +m 2 (X t _ 2 ) +e t , 

Note that the above model is exactly model (2.1). Thus backfitting and 
integration methods can be used to estimate the additive components. Our 
integration method can be easily extended to the case where there are three 
or more additive functions. Figure 6 shows the estimated functions m\{x) 
and rri2{x) by using the integration and pooled backfitting methods. Figure 7 
shows the corresponding residuals, which demonstrates that the integration 
method provides much better fitting than the pooled backfitting method. 
Failure of the latter method is the result of highly correlated covariates [see 
also Figure 1 (right)] in the fitted model. 

6. Discussion. In this article we have proposed several estimation meth- 
ods for additive models when its covariates are highly correlated and non- 
highly correlated. We derived asymptotic normality of the proposed estima- 
tors and illustrated their performance in finite samples via simulations. The 
performance of the proposed methodology was also demonstrated by two 
real data examples. 

Many problems remain open for the array-dependent model. Examples 
include: 
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o 5 10 o 5 10 

Fig. 6. Left panel: the estimated curve for mi(x); right panel: the estimated curve for 
m2(x). 

(i) Investigation of the asymptotic normality of the backfitting estima- 
tors when the covariates are highly correlated. 

(ii) Establishing the asymptotic distribution of the estimators in (2.11). 



Integration method 

1 r 




Backfitting 

4r 




_ 4 i 1 , 1 1 1 

2 4 6 8 10 

Fig. 7. Top panel: residual plot when the integration method is used; bottom panel: resid- 
ual plot when the backfitting method is used. 
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(iii) Test if the nonparametric functions rrij have certain parametric forms. 
The generalized likelihood ratio tests can be used [see Fan and Jiang (2005, 
2007)]. 

APPENDIX A: CONDITIONS 

(i) The kernel K{-) is a continuous and symmetric function and has 
compact support, and its first derivatives had a finite number of sign changes 
over its support. 

(ii) The densities of fjS are bounded and continuous, have compact sup- 
port and their first derivatives have a finite number of sign changes over their 
supports. Also, fj(xj) > for all Xj S supp(/j). 

(iii) As 67 — > oo, hj — >■ 0, h — > 0, Ghj/logG — > oo and Gh/\ogG— > oo. 

(iv) The second derivatives of rrij exist and are continuous and bounded. 

APPENDIX B: PROOFS OF THEOREMS 

Proof of Theorem 2.1. Let Y (fc) = {Y^ k \. . .,Y G k) ) T , 

rn g (X gk ) = m kl (Xg k ) + b G Ug k m' 1 {X gk ) and iii = (rhi(X lk ), . . . ,fh G (X Gk )) T . 
Then (2.4) becomes 

Y (fc) =m + e( fc ). 

By (2.7), we have 

§(x) - 6(x) = (Z T KZ)" 1 Z T Ke( fe ) + (Z T KZ)- 1 Z T K[m - Z9(x)} 

(B.l) 

= V(x) +B(x). 
Note that for \X gk — x\ < h, 

rh g (X gk ) = m k i(x) +m' kl (x)(X gk - x) 

+ Im'i^x^Xgk - xf + o(X gk - xf 
+ bcUgkim'^x) + m'{{x){X gk - x) 

+ \mf\x){X gk - x) 2 } + o(X gk - xfb G 
= Z g r 6(x) + \ml l {x){X gk -x) 2 

+ \mf' \x)b G Ug k (X gk - x) 2 + o(h 2 + hb G ), 
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uniformly for g = 1, . . . , G. Let 

(X lk - x) 2 h~ 2 (X lk -x) 2 h~ 2 u lk 



X 



{X Gk -x) 2 h 2 (X Gk -x) 2 h 2 u Gk 



Then 



h 2 . 



(B.2) mi* - Z6(x) = yX 



m ki\ x ) 
(x)b G 



+ o(l)(h 2 + b G h 2 ), 



where 1 is a G x 1 vector with all elements being l's, and hence 

h 2 



(B.3) BO) = (Z' J KZ)^Z T K { X 



hb G m'{{x) 



+ o(l)(h 2 + h 2 b G ) }. 



Let S T = Z T KZ. Then S T = Y,g =1 K h (X gk - x)Z g Z' g , and the (i, j)th ele- 
ment of St is 

( G 



y^K h (Xg k - x)(Xg k ~ X) 



i+j-2 



for 1 < i,j < 2; 



9=1 
G 



^K h {X gk -x){X gk -x) i+j 4 b G u gk , for z = l,2;j = 3,4; 



9=1 
G 



^^(X gfc - x){X gk - x) l+j - 4 b G u gk , for t = 3,4; j = 1,2; 



9=1 
G 

Y,K h (X gk - x){X gk - xy + j-%u 2 gk , for i,j = 3,4. 

^ 9=1 

Directly computing the mean and variance, we obtain that: 

(i) for l<i,j<2, 

G 

g-^-w-Qstm = G-^KhiXgk - x)(x gk - x y + i- 2 h-^-v 

9=1 

= f k (x)pL i+j - 2 (K) + O p (h + 1/VGh); 

(ii) for i = l,2 and j = 3, 4, or i = 3, 4 and j = 1, 2, 

G- 1 fc-( <+3 '- 4 >&c 1 S T ,i i 

G 

= G" 1 ^ 1^(A> - x)(X gk - x)^- A h^+^u gk 
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(iii) for i,j = 3,A, 

G 

= G~ l K h (X gk - x){X gk - x?*-«h-W-*>u 2 gk 



9=1 



Therefore, 
(B.4) 



f k {x)iH+j-a{K) + O p (h + 1/VGh). 



G^H^StHT 1 = f k (x)S + O p (U T )[ h + 



<Gh 



By simple algebra and (B.2), we have 

G _1 H _1 Z T K[m lfc - Z6(x)] 



G~ 1 H~ 1 Z r K 



x < 



X 



r h 2 



+ o(l)(h 2 + b G h) 



h 2 

yKi( x ) 
hbcm'Kx) 



+ o(h 2 + b G h), 



uniformly for components where A = (Aij) is a 4 x 2 matrix with 

G 

G- 1 Y, K h (X gk - x)h~^\X gk - x) i+ i, for i = 1, 2 and 



— < 



3=1 
G 



G 1 Y J K h(X gk -x)h \Xgk-xyUgk, fori = 1,2 and 



9=1 
G 



G 1 Y K h{X g k- x)h 2 {X gk -x) 2 u gk , fori = 3,j = l; 



9=1 
G 



x)u gk , 



G-^KhiXgk-xJh^iXgk 

9=1 
G 

G- 1 ]T K h {X gk - x)h- 3 (X gk - xfu gk 



9=1 
G 



9=1 



x ) 3u gk> 



for i = 3, j = 2; 
for i = 4, j = 1; 
for i = 4, j = 2. 
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Directly computing the mean and variance of A^, we obtain that A 
fk{x)C + o(h 2 + boh), uniformly for components. Then 



G -1 H -1 Z T K[mi fc - Z6(x)] 



fk(x) 
h 2 



112 

^3 

111 

L /i 3 



h 2 

^b G mf\x) 
,(3) 



+ o(h 2 + b G h) 



/k(i)CK(i), m^(x)b G y + o(h 2 + boh), 



uniformly for components. Thus 



HB(x) = HSy 1 Z T K[mi fc - Z0(x)] 



(H- 1 SrH- 1 ) -1 H~ 1 Z T K[mi fc - Z6(x)} 
2 



^S- 1 CK 1 (x),mf ) (x)6 G f(l + 0p (l)). 



This combined with (B.l) yields that 



H(0» - 6{x)) - '-^-S~ 1 C(m / l 1 (x),b G mf\x)) T (l + o p (l)) = HV(x) 



2 

(B.5) 

By the definition of V(x) we have 



HV(x) = HSy 1 Z T KeW = (H- 1 S T H- 1 )- 1 H- 1 Z T Kg 



~{k) 



Plugging (B.4) into the right-hand side above, we establish that 



HV(x) = G- 1 (/ fc (x)S)- 1 



f- 1 (x)S- 1 J G (x) 



1 

Xik - x 
h 

X 1fc - x 



-Ulk 



1 

X Gk - x 
h 

Ugk 

X Gk ~x 



h 



-UGk 



K 



(k) 

e G 
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where 



Jg(x) 



G 



9=1 



° ( 



Xgk-X 



G 



9=1 

G 

G-^KhiXgk-x) 

9=1 



gk x)Ug k £g ^ 



Under the working model (2.10), we obtain from (2.4) that 



(B.6) 



^ = x 



(Xg k )b 2 G U 2 gk (l+O p (l))+E 



(*) 



Using an argument similar to that for Lemma 7.3 of Jiang and Mack (2001), 
we can show that 

VGh[UV(x) - i6|m' 1 '(x)S- 1 c*(l + 0p (l)) + O p {b 2 G /V~Gh)} 
^N(0,2f k 1 (x)a 2 S- 1 VS- 1 ), 

which together with (B.5) and fi(x) = fk(x)(l + o(l)) leads to the result of 
the theorem. □ 

Proof of Corollary 2.1. Let e 3 = (0,0,1,0) T . Then 
rh'i{x; k) — m'i{x) 

= el(9(x)-6{x)) 



■ h 2 



ef Y U- 1 S- 1 C(m'l 1 (x),b G mf\x)) T (l + o p (l)) 

+ elf k -\x)n- 1 S- 1 J G (x). 
It is easy to verify that e^H^S" 1 = (0, 0, b G }^ l {K), 0). Then 
rh'i(x; k) — m'i(x) 



(B.7) 



^-^(K)^\K)mf\x)(l + o p (l)) 

G 

+ f-\x)^\K)b- G l G- 1 Kh(X gk - x)u 

9=1 



(k) 
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This combined with the asymptotic normality of Jg{ x ) completes the proof 
of the corollary. □ 

Proof of Theorem 2.2. By (B.7), 
rh'i(x) — m'i(x) 

= ^^(K)^\K)mf{x){l + o p (l)) 

J G 

+ JTi E fkH^oH^G- 1 J2 K h (X gk - x)u gk ef\ 

k=2 g=l 

Then using (B.6), we obtain that 

rh'i(x) — m'i(x) 
h 2 



(B.8) 



2 -^{K)^\K)mf\x)(l + o p {l)) 
+ h G m'{{x)E{u\ k ){l + o p {l)) 



J G 

+ J^T,fk 1 (x)»oHK)b G 1 G- 1 ^2K h (X gk - x)u gk e^ 

k=2 g=l 



Let B G (x) = f k \x)^ \K)b- G l G~ 1 Z% K h {X gk -x)u gk e g k) . Then 

E[B G {x)} = 0. Note that E(u gkl u gk2 ) = p{h,k 2 ) and E{e g kl) e ( g 2) } = a 2 for 
k\ 7^ k 2 and 2a 2 for k\ = hi . It follows that 

E[VGhb G B G (x)} 2 

^ ' fci,fe 2 =2 

x E{hK h (X gkl - x)K h (X gk2 - x) 
xE{e g k ^ef^)E(u gkl u gk2 )} 

2 J 

^/ fc - 2 (x)/x - 2 (^)£;{^ 2 (X gfe - x)}<r 2 p(fc, k) 



(J-l) 



fc=2 



+ (7^1)2 E 4>)/^>K 2 W 



E{hK h (X gkl - x)K h (X gk2 - x)}a 2 p(k 1 ,k 2 ). 
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Using f k (x) = fi(x)(l + o(l)) and 

E{hK h (X gkl - x)K h {X gk2 - x)} = E{hK 2 h (X gl - x)}(l + o(l)) 

= f 1 (x)v Q (K)(l + o(l)), 

we arrive at 

E[VGhb G B G (x)} 2 = pa 2 f^(x)^ 2 (K)u (K)(l + o(l)), 

where p = jj^j Efc=2 *0 + Efc 1= 2 Efc 2 =2 p( fc i ' fc 2 )] • Therefore, \/g7i&g x 
Bq{x) is asymptotically normal with mean zero and variance cr 2 (x). This 
together with (B.8) completes the proof of the theorem. □ 

Proof of Theorem 2.3. Observing that 



mi(x) = -G^ 1 ^/ rh' 1 (t)dt + m[(t)dt 

= -G~ 1 ^2 [m[(t) -m' 1 (t)]dt + [m'^t) - m[(t)] dt 

i J in Jin 



g=l Jx Jx 
G 

+ mi(x)-G- 1 ^mi(l 9 i) 

9=1 

and G" 1 Y^=\ m i{ x gi) = O p (G~ 1/2 ), we obtain that 
rh\{x) — m\{x) 



„ . A k ) r x 

algebra, 

rh\{x) — mi(x) 



-G-'Y, / [<(t)-m[(t)}dt 

9=1 7x0 

+ r[m' 1 (t)-mi(t)]di + O p (G'- 1 / 2 ) 

G nf£ 

G- 1 ^ / [mi^-m'^dt + O^G- 1 ' 2 ). 



Let J fciG = G 1 Ylg=i u gk£g f Xgl K h{X gk - t) dt. Then by (B.8) and simple 



^-^{K)^\K)[m'{{ X ) - Em'{(X n )](l + 0p (l)) 
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+ i& G £(«? fc )K(x) - Em'^Xnm + o p (l)) 

j 



k=2 



Let 



1 J 

k=2 

x u ak4 k) [ K h(X gk - t) dt. 

1 J Xn^ 



Then 

E[VGb G C G {xf 

J 



J2f-\x)nv 2 (K)El f K h {X gk -t)dt\ a 2 p(k,k) 



y ' ki^k 2 

xe\I K h {X gkl -t)dt 



9l 



Since 



j K h (X gkl -t)dt j K h (X gk2 -t)dt\ 

(.JXgl J Xgl ) 

2 



E^£ K h (X gl -t)dt} (l + o(l)) 

/oo ( rx ^2 

h(u)dulj K h (u-t)dt\ 



E[VGb G C G (x)] 2 = \f^ 2 (x)a 2 p + o(l). 



Therefore, V Gb G C G (x) is asymptotically normal with mean zero and vari- 
ance a 2 {x). □ 
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Proof of Theorem 3.1. As in Opsomer and Ruppert (1997), we let 



Qmifal) 



(X G1 - Xl ) 2 ] 



dmi(xi) 



S LXiiQm.i(Xn) 



S LX G1 Qmi( X Gl) 



and similarly for Qm k (%k) and Let h 2 = h 2 l. Then by the proof of 
Theorem 4.1 of Opsomer and Ruppert (1997), 



(Ig-sjsj;) 



Sl)m k = m k + i(I G - S;S£) _1 S;Q fc + o p (h 



and 



(I G - SJSJ)- 1 ^ - Si)mi = m a - i(I G - SJS^-^i + o p (h 2 ), 
where = (I G - ll T /G)Qi. Thus, 

(B.9) E(&! - mi|X) = |(I G - SJSD-^QJ - S?Q fe ) + o p (h 2 + h 2 k ). 

By (3.4), m^Wym + Wye?. Note that Var(4 i) |X) = 2a 2 I G and for 

j ^ k, Cov(4 J) ,4 fe) l x ) = ^ 2l G- It follows that 



Cov{m[ j) (X gl ),m\ K> (X gl )\X} 



(*), 



(B.10) 



and 



^{1 - e£ (I - S^Sp-^I - SDe s - e£ (I- SJS^-^I - SJ)e fl 

+ e J(i - s^sp-Hi - SJ)(I - s;) T (i - S^)^} 



Var{?ni j) (X 9l )|X} 
(B.ll) =2a 2 {l-2eJ(I-StS*)- 1 (I 



Si)e s 



+ ej(l - SjSp-^I - SJ)(I - S1) T (I - StS*)" T e,}. 

Using the same argument as that for (10) in Opsomer and Ruppert (1997), 
we obtain that for j, k = 2, . . . , J (j ^ k), 



Cav{m?{X gl ),m^{X gl )\X) = -^/f '(^iHW + 



,(*) 



1 



Ghi 



1 



Ghi 



and 



Var(mS j) (X 5l )|X) = ^-^/f '(^i)^) + o f 



Gh 



Gh, 



Therefore, by (3.5), 



Var(mi(Z gl )|X) = (J - 1)~ 2 Cov(m^(X 3l ),mf } (X 9l )|X) 

j,fc=2 



J 



1 



J-lGhi 



a 2 f^(X gl )u (K)+o p 



Gh! 
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The conditional bias of mi is obviously the sum of biases for each m\ 
(k = 2, . . . , J). This completes the proof of the theorem. □ 

Proof of Theorem 3.2. The result can be proved along the line of 
Theorem 3.1 in Opsomer (2000). □ 

Acknowledgments. The authors thanks the Associate Editor and the 
referees for constructive comments that substantially improved an earlier 
version of this paper. 
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